Experiments on a parametric nonlinear spectral warping for an HMM-based speech recognizer
نویسنده
چکیده
This paper is concerned with the search for an optimal feature-set for a speech recognition system. A better acoustic feature analysis that suitably enhances the semantic information in a consistent fashion can reduce raw-score (no grammar) error rate sig-niicantly. A simple two-dimensional parameterized feature set is proposed. The feature-set is compared against a standard mel-cepstrum, LPC-based feature-set in talker-independent, connected-alphadigit HMM-based rec-ognizer. The results show that a particular combination of parameters yields a signiicantly lower error rate than the baseline mel-cepstrum LPC-based feature-set.
منابع مشابه
Robust speech recognition and feature extraction using HMM2
This paper presents the theoretical basis and preliminary experimental results of a new HMM model, referred to as HMM2, which can be considered as a mixture of HMMs. In this new model, the emission probabilities of the temporal (primary) HMM are estimated through secondary, state specific, HMMs working in the acoustic feature space. Thus, while the primary HMM is performing the usual time warpi...
متن کاملLSP weighting functions based on spectral sensitivity and mel-frequency warping for speech recognition in digital communication
In digital communication networks, a speech recognition system extracts feature parameters after reconstructing speech signals. In this paper, we consider a useful approach of incorporating speech coding parameters into a speech recognizer. Most speech coders employ line spectrum pairs (LSPs) to represent spectral parameters. We introduce weighted distance measures to improve the recognition pe...
متن کاملMapping frames with DNN-HMM recognizer for non-parallel voice conversion
To convert one speaker’s voice to another’s, the mapping of the corresponding speech segments from source speaker to target speaker must be obtained first. In parallel voice conversion, normally dynamic time warping (DTW) method is used to align signals of source and target voices. However, for conversion between non-parallel speech data, the DTW based mapping method does not work. In this pape...
متن کاملFormant-based frequency warping for improving speaker adaptation in HMM TTS
Vocal Tract Length Normalization (VLTN), usually implemented as a frequency warping procedure (e.g. bilinear transformation), has been used successfully to adapt the spectral characteristics to a target speaker in speech recognition. In this study we exploit the same concept of frequency warping but concentrate explicitly on mapping the first four formant frequencies of 5 long vowels from sourc...
متن کاملتخمین سریع ضرایب پیچش در هنجارسازی طول مجرای صوتی با استفاده از امتیاز به دست آمده از مدلسازی تشخیص جنسیت
The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1996